32 research outputs found

    Evaluating the forensic importance of glottal source features through the voice analysis of twins and non-twin siblings

    In this study we have analyzed 853 tokens of the vowel filler [ei], extracted from spontaneous speech fragments of 54 male Spanish speakers (North-Central Peninsular variety), each one recorded in two separate sessions. The speakers, to be compared in a pairwise fashion, were divided into four groups: 24 monozygotic (MZ) twins, 10 dizygotic (DZ) twins, 8 non-twin brothers and 12 unrelated speakers. From the extracted vowel fillers, considered long enough for a glottal analysis (around 160 milliseconds), a vector of 68 glottal parameters was created. Our hypothesis that higher similarity values would be found in the intra-pair comparison of MZ twins than in DZ twins, brothers or unrelated speakers was confirmed, which suggests that the glottal parameters under investigation are genetically influenced. This finding seems of great forensic importance, as a phonetic parameter is considered forensically robust provided that it exhibits large between-speaker variation while remaining as consistent as possible for each speaker (i.e. small within-speaker variation).

    CIVIL Corpus: Voice Quality for Speaker Forensic Comparison

    The most frequent way in which criminals disguise their voices involves changes in phonation type, although these are difficult to maintain for a long time. This mechanism severely hampers identification. Currently, the CIVIL corpus comprises 60 Spanish speakers. Each subject performs three tasks: spontaneous conversation, carrier sentences and reading, using modal, falsetto and creak(y) phonation. Two different recording sessions, one month apart, were conducted for each speaker, who was recorded with microphone, telephone and electroglottography. This is the first (open-access) corpus of disguised voices in Spanish. Its main purpose is finding biometric traces that remain in the voice despite disguise.

    Exploring pause fillers in conversational speech for forensic phonetics: findings in a Spanish cohort including twins

    Pause fillers occur naturally during conversational speech, and have recently generated interest in their use for forensic applications. We extracted pause fillers from the conversational speech of 54 speakers, including twins, whose voices are often perceptually similar. Overall, 872 tokens of the sound [e:] were extracted (7-33 tokens per speaker), and objectively characterised using 315 acoustic measures. We used a Random Forest (RF) classifier and tested its performance using a leave-one-sample-out scheme to obtain probabilistic estimates of binary class membership denoting whether a query token belongs to a speaker. We report results using the Receiver Operating Characteristic (ROC) curve, and computing the Area Under the Curve (AUC). When the RF was presented with at least 20 tokens in the training phase for each of the two classes, we observed AUC in the range 0.71-0.98. These findings have important implications for the potential of pause fillers as an additional objective tool in forensic speaker verification.
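
    The AUC reported above can be read as the probability that a randomly chosen token from the target speaker receives a higher score than a randomly chosen token from another speaker. The following is a minimal sketch of that computation only, not the authors' pipeline: the function name `roc_auc` and the scores are hypothetical, and the Random Forest and the leave-one-sample-out loop that would produce such scores are not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive token scores higher; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one token from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out class-membership probabilities (made-up values).
# Label 1: the query token belongs to the target speaker; 0: it does not.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

    The pairwise form is quadratic in the number of tokens, which is acceptable for a few hundred tokens; a rank-based formulation would be the usual choice for larger sets.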

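    Reconocimiento automático de locutor con hermanos españoles: hermanos gemelos (monozigóticos y dizigóticos) y no gemelos (Automatic speaker recognition with Spanish siblings: twin (monozygotic and dizygotic) and non-twin brothers)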

    The performance of the automatic speaker recognition (ASR) system Batvox™ (Version 4.1) has been tested with a male population of 24 monozygotic (MZ) twins, 10 dizygotic (DZ) twins, 8 non-twin siblings and 12 unrelated speakers (aged 18–52, with Standard Peninsular Spanish as their mother tongue). Since the cepstral features on which this ASR system is based depend largely on anatomical–physiological foundations, we hypothesized that such features ought to be gene-dependent. Therefore, higher similarity values should be found in MZ twins (100% shared genes) than in DZ twins, in brothers (B) or in a reference population of unrelated speakers (US). Results corroborated the expected decreasing scale MZ > DZ > B > US, since the similarity coefficients yielded by the automatic system for these speakers decreased in exactly the same direction as the degree of kinship of the four speaker groups. This suggests that the system features are to a great extent genetically conditioned and that they are hence useful and robust for comparing speech samples of known and unknown origin, as found in legal cases. Furthermore, the 9.9% EER (Equal Error Rate) obtained when testing MZ pairs lies close to the value (11% EER) found in Künzel (2010) with German twins.

    Isolation and characterization of a potato cDNA corresponding to a 1-aminocyclopropane-1-carboxylate (ACC) oxidase gene differentially activated by stress

    3 pages, 3 figures. PMID: 12432039 [PubMed]. The 1-aminocyclopropane-1-carboxylate (ACC) oxidase enzyme catalyses the final step in ethylene biosynthesis, converting 1-aminocyclopropane-1-carboxylic acid to ethylene. A cDNA clone encoding an ACC oxidase, ST-ACO3, was isolated from potato (Solanum tuberosum L.) by differential screening of a Fusarium eumartii infected-tuber cDNA library. The deduced amino acid sequence exhibited similarity to other ACC oxidase proteins from several plant species. Northern blot analysis revealed that the ST-ACO3 mRNA level increased in potato tubers upon inoculation with F. eumartii, as well as after treatment with salicylic acid and indole-3-acetic acid, suggesting cross-talk between different signalling pathways involved in the defence response of potato tubers against F. eumartii attack. This work was partially supported by the IFS (Sweden), CONICET, UNMDP, ANPCyT (Grant No. 01-09768), and Fundación Antorchas (Argentina). MEZ and CT were recipients of a fellowship from CONICET and UNMDP, respectively. Peer reviewed.

    Euclidean distances as measures of speaker similarity including identical twin pairs: a forensic investigation using source and filter voice characteristics

    There is a growing consensus that hybrid approaches are necessary for successful speaker characterization in Forensic Speaker Comparison (FSC); hence this study explores the forensic potential of voice features combining source and filter characteristics. The former relate to the action of the vocal folds while the latter reflect the geometry of the speaker’s vocal tract. This set of features has been extracted from pause fillers, which are long enough for robust feature estimation while spontaneous enough to be extracted from voice samples in real forensic casework. Speaker similarity was measured using standardized Euclidean Distances (ED) between pairs of speakers: 54 different-speaker (DS) comparisons, 54 same-speaker (SS) comparisons and 12 comparisons between monozygotic twins (MZ). Results revealed that the differences between DS and SS comparisons were significant in both high-quality and telephone-filtered recordings, with no false rejections and limited false acceptances; this finding suggests that this set of voice features is highly speaker-dependent and therefore forensically useful. Mean ED for MZ pairs lies between the average ED for SS comparisons and DS comparisons, as expected according to the literature on twin voices. Specific cases of MZ speakers with very high ED (i.e. strong dissimilarity) are discussed in the context of sociophonetic and twin studies. A preliminary simplification of the Vocal Profile Analysis (VPA) Scheme is proposed, which enables the quantification of voice quality features in the perceptual assessment of speaker similarity, and allows for the calculation of perceptual–acoustic correlations. The adequacy of z-score normalization for this study is also discussed, as well as the relevance of heat maps for detecting the so-called phantoms in recent approaches to the biometric menagerie.
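
    As a rough illustration of a standardized Euclidean distance of the kind described above, the sketch below z-scores each feature across all recordings and then takes the ordinary Euclidean distance between two recordings. It assumes population standard deviations and a two-feature vector; the recording names, feature values and helper names (`zscore_columns`, `euclidean`) are hypothetical, and the study's actual feature set and normalization details are not reproduced.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score every feature (column) across all recordings (rows)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    # A constant feature carries no information, so it is mapped to 0.0.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical two-feature vectors, one per speaker and session (made up).
features = {
    "spk1_s1": [120.0, 0.50],
    "spk1_s2": [122.0, 0.52],
    "spk2_s1": [140.0, 0.80],
    "spk2_s2": [138.0, 0.78],
}
names = list(features)
z = dict(zip(names, zscore_columns([features[n] for n in names])))

ss = euclidean(z["spk1_s1"], z["spk1_s2"])  # same-speaker comparison
ds = euclidean(z["spk1_s1"], z["spk2_s1"])  # different-speaker comparison
print(f"SS distance = {ss:.3f}, DS distance = {ds:.3f}")
```

    Standardizing first keeps features measured on large numeric scales from dominating the distance, which is the usual motivation for z-scoring before comparing mixed source and filter features.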

    The individual and the system: Assessing the stability of the output of a semi-automatic forensic voice comparison system

    Semi-automatic systems based on traditional linguistic-phonetic features are increasingly being used for forensic voice comparison (FVC) casework. In this paper, we examine the stability of the output of a semi-automatic system, based on the long-term formant distributions (LTFDs) of F1, F2, and F3, as the channel quality of the input recordings decreases. Cross-validated, calibrated GMM-UBM log likelihood-ratios (LLRs) were computed for 97 Standard Southern British English speakers under four conditions. In each condition the same speech material was used, but the technical properties of the recordings changed (high-quality studio recording, landline telephone recording, high bit-rate GSM mobile telephone recording and low bit-rate GSM mobile telephone recording). Equal error rate (EER) and the log LR cost function (Cllr) were compared across conditions. System validity was found to decrease with poorer technical quality, with the largest differences in EER (21.66%) and Cllr (0.46) found between the studio and the low bit-rate GSM conditions. Importantly, however, performance for individual speakers was affected differently by channel quality. Speakers that produced stronger evidence overall were found to be more variable. Mean F3 was also found to be a predictor of LLR variability; however, no effects were found based on speakers’ voice quality profiles.
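
    For readers unfamiliar with the two validity metrics named above, the sketch below computes both from lists of same-speaker and different-speaker log likelihood-ratios. It assumes natural-log LRs, uses the standard definition of Cllr, and approximates the EER by a simple threshold sweep; the function names and the LLR values are hypothetical and unrelated to the study's data.

```python
import math


def cllr(ss_llrs, ds_llrs):
    """Log-LR cost: penalises same-speaker (SS) LRs below 1 and
    different-speaker (DS) LRs above 1. Inputs are natural-log LRs."""
    ss_term = sum(math.log2(1 + math.exp(-l)) for l in ss_llrs) / len(ss_llrs)
    ds_term = sum(math.log2(1 + math.exp(l)) for l in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_term + ds_term)


def eer(ss_scores, ds_scores):
    """Approximate equal error rate: sweep the observed scores as thresholds
    and report the error where false-acceptance and false-rejection rates
    are closest to each other."""
    best = None
    for t in sorted(set(ss_scores) | set(ds_scores)):
        frr = sum(s < t for s in ss_scores) / len(ss_scores)   # SS rejected
        far = sum(s >= t for s in ds_scores) / len(ds_scores)  # DS accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


# Hypothetical calibrated LLRs (made-up values).
ss_llrs = [2.0, 1.0, -0.5, 3.0]
ds_llrs = [-2.0, -1.0, 0.5, -3.0]
print(f"EER  = {eer(ss_llrs, ds_llrs):.2%}")
print(f"Cllr = {cllr(ss_llrs, ds_llrs):.3f}")
```

    A system that always outputs an LLR of 0 (LR = 1, no evidence either way) scores a Cllr of exactly 1, so values below 1 indicate that the system is providing useful information.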

    Using dysphonic voice to characterize speaker's biometry

    Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper, phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, are proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64% can be granted. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs that of the prosecution, thus offering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.

    Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing

    In forensic voice comparison, there is increasing focus on the integration of automatic and phonetic methods to improve the validity and reliability of voice evidence to the courts. In line with this, we present a comparison of long-term measures of the speech signal to assess the extent to which they capture complementary speaker-specific information. Likelihood ratio-based testing was conducted using MFCCs and (linear and Mel-weighted) long-term formant distributions (LTFDs). Fusing automatic and semi-automatic systems yielded limited improvement in performance over the baseline MFCC system, indicating that these measures capture essentially the same speaker-specific information. The output from the best performing system was used to evaluate the contribution of auditory-based analysis of supralaryngeal (filter) and laryngeal (source) voice quality in system testing. Results suggest that the problematic speakers for the (semi-)automatic system are, to some extent, predictable from their supralaryngeal voice quality profiles, with the least distinctive speakers producing the weakest evidence and most misclassifications. However, the misclassified pairs were still easily differentiated via auditory analysis. Laryngeal voice quality may thus be useful in resolving problematic pairs for (semi-)automatic systems, potentially improving their overall performance.